Improving K-Means by Outlier Removal
نویسندگان
چکیده
We present an Outlier Removal Clustering (ORC) algorithm that provides outlier detection and data clustering simultaneously. The method employs both clustering and outlier discovery to improve estimation of the centroids of the generative distribution. The proposed algorithm consists of two stages. The first stage consist of purely K-means process, while the second stage iteratively removes the vectors which are far from their cluster centroids. We provide experimental results on three different synthetic datasets and three map images which were corrupted by lossy compression. The results indicate that the proposed method has a lower error on datasets with overlapping clusters than the competing methods.
منابع مشابه
Clustering with Outlier Removal
Cluster analysis and outlier detection are strongly coupled tasks in data mining area. Cluster structure can be easily destroyed by few outliers; on the contrary, the outliers are defined by the concept of cluster, which are recognized as the points belonging to none of the clusters. However, most existing studies handle them separately. In light of this, we consider the joint cluster analysis ...
متن کاملDetecting Suspicious Card Transactions in unlabeled data of bank Using Outlier Detection Techniqes
With the advancement of technology, the use of ATM and credit cards are increased. Cyber fraud and theft are the kinds of threat which result in using these Technologies. It is therefore inevitable to use fraud detection algorithms to prevent fraudulent use of bank cards. Credit card fraud can be thought of as a form of identity theft that consists of an unauthorized access to another person's ...
متن کاملEfficient Classification Technique for Outlier Detection
Outliers are the data objects that clearly differ in their behavior from the normal data. Outlier detection mainly aims at finding these data objects. Outlier detection has become the major area of research in data mining. This plays a crucial role in data mining. Most of the methods used for outlier detection, consider the positive data and their behavior, and then the data violating the behav...
متن کاملImpact of durational outlier removal from unit selection catalogs
Outlier removal is a straightforward technique for improving the quality of unit selection catalogs without hand correction. This paper investigates the use of phone durations as a criteria for removing bad units. Scoring conditioned on linguistic context demonstrably better than statistics based on phone class alone. The impact of voice modification is evaluated with a 444K utterance test corpus.
متن کاملOutlier Detection Using Enhanced K-means Clustering Algorithm and Weight Based Center Approach
ABSTRACT-In Data mining there are lots of methods are used to detect the outlier by making the clusters of data and then detect the outlier from them. In general Clustering method plays a very important role in data mining. Clustering means grouping the similar data objects together based on the characteristic they possess. Outlier Detection is an important issue in Data mining; particularly it...
متن کامل